Scaling up Group Closeness Maximization
Authors
Abstract
Closeness is a widely used centrality measure in social network analysis. For a node, it indicates the inverse of the average shortest-path distance to the other nodes of the network. While the identification of the k nodes with highest closeness has received significant attention, many applications are actually interested in finding a group of nodes that is central as a whole. For this problem, a greedy algorithm with approximation ratio (1−1/e) was proposed only recently [Chen et al., ADC 2016]. Since this algorithm's running time is still too high for large networks, a heuristic without approximation guarantee was also proposed in the same paper. In the present paper we develop new techniques to speed up the greedy algorithm without losing its theoretical guarantee. Compared to a straightforward implementation, our approach is orders of magnitude faster and, compared to the heuristic proposed by Chen et al., we always find a solution with better quality in a comparable running time in our experiments. Our method Greedy++ allows us to approximate the group with maximum closeness on networks with up to hundreds of millions of edges in minutes or at most a few hours. To obtain the same theoretical guarantee, the greedy approach of [Chen et al., ADC 2016] would take several days even on networks with hundreds of thousands of edges. In a comparison with the optimum, our experiments show that the solution found by Greedy++ is actually much better than the theoretical guarantee: over all tested networks, the empirical approximation ratio is never lower than 0.97. Finally, we study for the first time the correlation between the top-k nodes with highest individual closeness and an approximation of the most central group in large complex networks. Our results show that the overlap between the two is relatively small, which indicates empirically the need to distinguish clearly between the two problems.
∗This work is partially supported by German Research Foundation (DFG) grant ME 3619/3-1 within the Priority Programme 1736 Algorithms for Big Data. arXiv:1710.01144v1 [cs.DS] 3 Oct 2017
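To make the problem in the abstract concrete, the following is a minimal Python sketch of a straightforward greedy baseline for group closeness on an unweighted, connected graph. It is not Greedy++ and not the implementation of Chen et al. The function names, the adjacency-dict representation, and the use of group farness (the sum of distances from the group to all other nodes, whose minimization is assumed here to correspond to maximizing group closeness) are illustrative assumptions.

```python
from collections import deque


def bfs_distances(adj, sources):
    """Multi-source BFS: hop distance from the nearest source to each reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def group_farness(adj, group):
    """Sum of distances from the group to every node outside it.

    Assumes a connected graph, so every node is reached by the BFS.
    """
    dist = bfs_distances(adj, list(group))
    return sum(dist[v] for v in adj if v not in group)


def greedy_group(adj, k):
    """Naive greedy: in each round, add the node that gives the smallest farness."""
    group = []
    for _ in range(k):
        candidates = [v for v in adj if v not in group]
        best = min(candidates, key=lambda v: group_farness(adj, group + [v]))
        group.append(best)
    return group


# Hypothetical toy input: the path graph 0-1-2-3-4-5-6.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(greedy_group(path, 1))
print(group_farness(path, greedy_group(path, 2)), group_farness(path, [1, 5]))
```

Each greedy round in this sketch runs one BFS per candidate node, which is the kind of cost that makes a straightforward implementation impractical on large networks. On the toy path, the greedy group of size 2 has farness 8 while the group {1, 5} has farness 6, a small illustration that greedy yields an approximation rather than the optimum.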
Related resources
Revisiting the Stop-and-Stare Algorithms for Influence Maximization
Influence maximization is a combinatorial optimization problem that finds important applications in viral marketing, feed recommendation, etc. Recent research has led to a number of scalable approximation algorithms for influence maximization, such as TIM and IMM, and more recently, SSA and D-SSA. The goal of this paper is to conduct a rigorous theoretical and experimental analysis of SSA and D...
Progress in Global Surgery; Comment on “Global Surgery – Informing National Strategies for Scaling Up Surgery in Sub-Saharan Africa”
Impressive progress has been made in global surgery in the past 10 years, and now serious and evidence-based national strategies are being developed for scaling-up surgical services in sub-Saharan Africa. Key to achieving this goal requires developing a realistic country-based estimate of burden of surgical disease, developing an accurate estimate of existing need, deve...
I/O-efficient calculation of H-group closeness centrality over disk-resident graphs
We introduce H-group closeness centrality in this work. H-group closeness centrality of a group of nodes measures how close this node group is to other nodes in a graph, and can be used in numerous applications such as measuring the importance and influence of a group of users in a social network. When a large graph contains billions of edges that cannot reside entirely in a computer’s main m...
Scaling-Up Model-Based Clustering Algorithm by Working on Clustering Features
In this paper, we propose EMACF (Expectation-Maximization Algorithm for Clustering Features) to generate clusters from data summaries rather than data items directly. Incorporating with an adaptive grid-based data summarization procedure, we establish a scalable clustering algorithm: gEMACF. The experimental results show that gEMACF can generate more accurate results than other scalable cluster...
A new 2D block ordering system for wavelet-based multi-resolution up-scaling
A complete and accurate analysis of the complex spatial structure of heterogeneous hydrocarbon reservoirs requires detailed geological models, i.e. fine resolution models. Due to the high computational cost of simulating such models, single resolution up-scaling techniques are commonly used to reduce the volume of the simulated models at the expense of losing the precision. Several multi-scale ...